BMC Medical Genomics
Springer Science and Business Media LLC
Preprints posted in the last 90 days, ranked by how well they match BMC Medical Genomics's content profile, based on 36 papers previously published here. The average preprint has a 0.05% match score for this journal, so anything above that is already an above-average fit.
Smith, C.; Peter Durairaj, R. R.; Randall, E. L.; Aston, A. N.; Heraty, L.; Elsayed, W.; Murillo, A.; Dion, V.
The expansion of short tandem repeats is a feature of over 60 different human diseases. Ongoing somatic instability throughout a patient's lifetime can influence disease progression and has emerged as a therapeutic target. Understanding its mechanism is essential for identifying both drug targets and therapeutic interventions. A major obstacle to this translational goal has been measuring changes in repeat size distribution in a timely manner. To address this, we present the Single Clone-based Instability Assay (SCIA), a streamlined experimental design that saves weeks when assessing the effect of a gene knockout on repeat instability. The approach avoids bulk cultures, does not require a reporter cell line, and uses targeted long-read sequencing as the readout for repeat instability. We validated the approach using FAN1, PMS1, and MLH1 knockouts in HEK293-derived cells. We also provide visualization software that generates delta plots and extracts the instability frequency, the bias towards expansion or contraction, and the average size of the changes. Using SCIA, we find that although FAN1 knockout clones showed an increased frequency of expansions, the expansions themselves were smaller. This highlights the wealth of information that can be extracted and the potential for novel insights into the mechanism of repeat instability.
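The abstract lists three summary statistics that the SCIA visualization software extracts from repeat-size data. The minimal Python sketch below shows one way such metrics could be computed. It assumes that each read's repeat size has already been called and that each read is compared with a single reference (parental) size. The function names and exact metric definitions are illustrative assumptions, not the SCIA software's API. The sketch also ignores sequencing error, which a real analysis would need to account for.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class InstabilitySummary:
    frequency: float       # fraction of reads whose repeat size differs from the reference size
    expansion_bias: float  # expansions / (expansions + contractions); 0.5 means no bias
    mean_change: float     # mean absolute size change (repeat units) among changed reads


def delta_counts(read_sizes, reference_size):
    """Tally read-level size changes (read size minus reference size): the data behind a delta plot."""
    return Counter(size - reference_size for size in read_sizes)


def summarize(read_sizes, reference_size):
    """Hypothetical helper: instability frequency, expansion bias and mean change size."""
    deltas = [size - reference_size for size in read_sizes]
    changed = [d for d in deltas if d != 0]
    expansions = sum(1 for d in changed if d > 0)
    contractions = len(changed) - expansions
    if not changed:
        return InstabilitySummary(frequency=0.0, expansion_bias=0.5, mean_change=0.0)
    return InstabilitySummary(
        frequency=len(changed) / len(deltas),
        expansion_bias=expansions / (expansions + contractions),
        mean_change=sum(abs(d) for d in changed) / len(changed),
    )
```

For example, reads with repeat sizes 100, 100, 101, 102, 99 and 100 against a reference of 100 give an instability frequency of 0.5, an expansion bias of 2/3 and a mean change of 4/3 repeat units. This illustrates how a clone can show more frequent but smaller changes, as reported for FAN1.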
Muneeb, M.; Ascher, D.
Identifying disease-associated genes enables the development of precision medicine and the understanding of biological processes. Genome-wide association studies (GWAS), gene expression data, biological pathway analysis, and protein network analysis are among the techniques used to identify causal genes. We propose a machine-learning (ML) and deep-learning (DL) pipeline to identify genes associated with a phenotype. The proposed pipeline consists of two interrelated processes. The first classifies individuals as cases or controls based on their genotype data. The second calculates feature importance to identify genes associated with a particular phenotype. We analysed 30 phenotypes from the openSNP data using 21 ML algorithms and 80 DL algorithms and variants. The best-performing ML and DL models, evaluated by the area under the curve (AUC), F1 score, and Matthews correlation coefficient (MCC), were used to identify important single-nucleotide polymorphisms (SNPs). The identified SNPs were then compared with the phenotype-associated SNPs from the GWAS Catalog. The mean per-phenotype gene identification ratio (GIR) was 0.84. These results suggest that SNPs selected by ML/DL algorithms that maximize classification performance can help prioritise phenotype-associated SNPs and genes, potentially supporting downstream studies aimed at understanding disease mechanisms and identifying candidate therapeutic targets.
Muneeb, M.; Ascher, D.; Myung, Y.; Feng, S.; Henschel, A.
Genotype-phenotype prediction plays a crucial role in identifying disease-causing single nucleotide polymorphisms and in precision medicine. In this manuscript, we benchmark the performance of various machine/deep learning algorithms and polygenic risk score tools on 80 binary phenotypes extracted from the openSNP dataset. After cleaning and extraction, the genotype data for each phenotype are passed to PLINK for quality control and then transformed separately for each of the tools/algorithms considered. To compute polygenic risk scores, we used the quality control measures for the test data and the genome-wide association study summary statistics file, along with various combinations of clumping and pruning. For the machine learning algorithms, we used p-value thresholding on the training data to select single nucleotide polymorphisms, and the resulting data were passed to the algorithm. We report the average 5-fold Area Under the Curve (AUC) for 29 machine learning algorithms, 80 deep learning algorithms, and 3 polygenic risk score tools with 675 different clumping and pruning parameters. Machine learning performed best for 44 phenotypes, while polygenic risk score tools excelled for 36 phenotypes. The results give valuable insights into which techniques tend to perform better for certain phenotypes compared with more traditional polygenic risk score tools.
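The key point in the machine learning arm is that SNPs are selected by p-value thresholding on the training data only, so the held-out fold never influences feature selection. The sketch below illustrates this under several assumptions not stated in the abstract:

- a 1-df allelic chi-square test per SNP (the authors may have used a different association test, for example via PLINK);
- genotypes coded as 0/1/2 alternate-allele dosages;
- hypothetical function names.

```python
import math


def allelic_chi2_pvalue(dosages, labels):
    """1-df allelic chi-square test (no continuity correction) for a single SNP.

    dosages: alternate-allele counts (0, 1 or 2) per individual.
    labels: list with 1 for cases and 0 for controls.
    """
    case_alt = sum(g for g, y in zip(dosages, labels) if y == 1)
    ctrl_alt = sum(g for g, y in zip(dosages, labels) if y == 0)
    case_ref = 2 * labels.count(1) - case_alt
    ctrl_ref = 2 * labels.count(0) - ctrl_alt
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    # Product of the four margins of the 2x2 allele-count table.
    denom = (case_alt + case_ref) * (ctrl_alt + ctrl_ref) * (case_alt + ctrl_alt) * (case_ref + ctrl_ref)
    if denom == 0:
        return 1.0  # monomorphic SNP or empty group: no evidence of association
    chi2 = n * (case_alt * ctrl_ref - case_ref * ctrl_alt) ** 2 / denom
    # Upper tail of a chi-square distribution with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


def select_snps(train_genotypes, train_labels, threshold):
    """Return indices of SNPs whose training-fold p-value falls below `threshold`.

    train_genotypes: one list of dosages per individual (one entry per SNP).
    """
    selected = []
    for j in range(len(train_genotypes[0])):
        column = [row[j] for row in train_genotypes]
        if allelic_chi2_pvalue(column, train_labels) < threshold:
            selected.append(j)
    return selected
```

Inside a 5-fold cross-validation, `select_snps` would be called once per fold on that fold's training rows. The returned indices would then be used to subset both the training rows and the test rows before fitting the classifier.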
Cherchi, I.; Orlando, F.; Quaini, O.; Paoli, M.; Ciani, Y.; Demichelis, F.
The T2T-CHM13v2.0 reference genome added previously uncharacterized genomic sequences and improved the accuracy of repetitive stretches compared with previous human genome assemblies. We performed a comprehensive allelic variation analysis and collected read mapping statistics from sequencing reads aligned to the hg38 and T2T-CHM13 assemblies. The samples encompassed different sequencing designs and ethnicity groups. We observed that the T2T-CHM13v2.0 assembly significantly reduces reference mapping bias (RMB) and increases read mapping precision at clinically relevant sites, including BRCA1 pathogenic variants. Further, we report sequence dissimilarities among reference genomes in the proximity of ClinVar-annotated variants, suggesting the need for data re-analysis and potential redesign of probes targeting clinically relevant regions. Overall, these findings support the implementation of the T2T-CHM13 reference to improve sequencing data analyses in the clinical genomic setting.
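Reference mapping bias is commonly quantified at heterozygous sites, where an unbiased alignment should yield roughly equal numbers of reads supporting each allele. The sketch below assumes that definition: the mean reference-allele fraction minus 0.5. It also applies a hypothetical minimum-depth filter. The study's exact RMB metric may differ.

```python
def ref_allele_fraction(ref_reads, alt_reads):
    """Fraction of informative reads at a site that support the reference allele."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("site has no informative reads")
    return ref_reads / total


def mean_reference_bias(het_sites, min_depth=10):
    """Average excess of REF-supporting reads over the 0.5 expected at heterozygous sites.

    het_sites: iterable of (ref_reads, alt_reads) pairs; sites below `min_depth` are skipped.
    Returns ~0 when unbiased, and a positive value when reads favour the reference allele.
    """
    fractions = [ref_allele_fraction(r, a) for r, a in het_sites if r + a >= min_depth]
    if not fractions:
        raise ValueError("no heterozygous sites pass the depth filter")
    return sum(fractions) / len(fractions) - 0.5
```

To get a simple before/after view of the kind of reduction reported above, run it on the same sample's allele counts from the hg38 and T2T-CHM13 alignments and compare the two values.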
Lenihan-Geels, F.; Proft, S. A.; Bommer, M.; Heinemann, U.; Seelow, D.; Opitz, R.; Krude, H.; Schuelke, M.; Malecka, M.
Transcription factors recognise and bind specific DNA sequence patterns in promoters and enhancers, thereby regulating gene expression. Variations in the DNA sequence of transcription factor binding sites (TFBSs) can alter gene regulation and may disrupt development. The transcription factor NKX2.1 is a crucial regulator of thyroid, lung, and neural development. Mutations in its coding gene NKX2-1 may cause choreoathetosis and congenital hypothyroidism with or without pulmonary dysfunction (CAHTP, OMIM #610978). Most genetically solved patients carry mutations in the coding regions of NKX2-1 that affect DNA binding, whereas the majority of patients with CAHTP-like symptoms do not carry mutations in the NKX2-1 coding sequence. We hypothesise that variations in the DNA sequence at promoter or enhancer sites bound by NKX2.1 could also cause disease. We employed EMSA-seq to quantify the effects of genetic variation on NKX2.1 binding strength and used these data to train neural network models that forecast the influence of DNA variation on NKX2.1 binding. We validated our models using microscale thermophoresis, X-ray crystallography, and publicly available ChIP-seq data sets. The neural networks were able to detect TFBSs in ChIP-seq data. They can therefore be used to evaluate whole genome sequencing data from CAHTP patients in order to prioritise potential disease-causing variants in regulatory elements.
[Graphical abstract]
Chau, K.; Allison, K.; Braithwaite, T.; Harley, I.; Hassman, L. M.
Objective: To determine whether uveitis shares genetic similarity with extraocular immune-mediated inflammatory diseases (IMIDs), we performed network analysis of putative causal genes associated with ocular inflammatory disease, IMIDs and eye-specific diseases, including age-related macular degeneration and monogenic disorders. Methods: We identified putative causal genes for genome-wide significant variants from uveitis, IMIDs and ocular diseases using OpenTargets and published studies. To assess gene-level pleiotropy between disease groups, we quantified the causal gene overlap between groups and the Jaccard Similarity Index for each disease pair. We then used a network approach to assess the molecular genetic similarity between diseases at the biological pathway level, and comparative statistics to identify diseases with greater network similarity to uveitis. Results: Seventy-five percent of the putative causal genes for uveitis are also causal for IMIDs, while no uveitis genes are shared with primary ocular disorders. Network analysis revealed that 1) uveitis genes are more closely networked with systemic IMID genes than with ocular-specific disease genes; and 2) significant network similarity links uveitis and specific IMIDs, such as ankylosing spondylitis and sarcoidosis. Conclusions: Overlapping causal genes and network similarity indicate that uveitis is predominantly an inflammatory disease, sharing genetic architecture with other IMIDs. Future studies aimed at dissecting genetic heterogeneity within uveitis may determine whether subgroups share common immune pathways that could nominate endotype-specific therapeutic approaches.
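The Methods and Results use two overlap measures:

- the Jaccard Similarity Index, which is symmetric and computed for each disease pair;
- the share of uveitis genes that are also causal for IMIDs (the 75% figure), which is asymmetric.

A minimal sketch of both follows. The function names are hypothetical and the example gene sets are placeholders, not the study's data.

```python
def jaccard(genes_a, genes_b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two sets of putative causal genes."""
    a, b = set(genes_a), set(genes_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def shared_fraction(query, reference):
    """Fraction of `query` genes that also appear in `reference` (asymmetric overlap)."""
    q = set(query)
    return len(q & set(reference)) / len(q) if q else 0.0
```

For example, `jaccard({"G1", "G2", "G3", "G4"}, {"G2", "G3", "G4", "G5", "G6"})` is 3/6 = 0.5, whereas `shared_fraction` on the same sets is 3/4 = 0.75. The asymmetric measure is larger because it ignores genes unique to the second disease.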
Louw, N.; Makay, P.; Mpangase, P.; Naicker, T.; Yates, L.; Honey, E.; Mbungu, G.; Van Den Bogaert, K.; Firth, H.; Hurles, M.; Lukusa, P.; Devriendt, K.; Krause, A.; Carstens, N.; Lumaka, A.; Lombard, Z.
Copy number variants (CNV) contribute significantly to the pathogenic variation associated with developmental disorders. CNV detection is often not included in standard exome sequencing (ES) analysis, and diagnostic laboratories typically offer complementary methods such as chromosomal microarray to diagnose pathogenic CNV. In this study, we aimed to develop an optimal approach for incorporating CNV detection within our ES analysis process for the Deciphering Developmental Disorders in Africa (DDD-Africa) cohort. We analyzed ES data from 505 probands with a developmental disorder, applying a CNV detection approach that assessed data generated using the tools CANOES and XHMM. When available, parental ES data were used to assess inheritance patterns. We confirmed a diagnosis in 42/505 (8.3%) patients, with 44 pathogenic CNV identified in the probands: 31 deletions and 13 duplications. Among the 27 probands with parental data, all identified CNV were de novo. Adding CNV analysis to our ES analysis pipeline resulted in an 8.3% increase in diagnostic yield in the DDD-Africa cohort without additional laboratory cost. This is a feasible approach that is likely to reduce analytical cost. It is suitable for low- and middle-income countries, where funding and resources for genomic medicine initiatives are limited.
Song, S.; Zong, Y.; Xu, Y.; Chen, L.; Zhou, Y.; Chen, L.; Li, G.; Xiao, T.; Huang, M.
Background: Kawasaki disease (KD) is a pediatric systemic vasculitis in which T-cell-mediated immune responses play a pivotal role. However, the precise dynamic evolution of T-cell subsets during disease progression remains poorly understood. Methods: Single-cell RNA sequencing (scRNA-seq) was used to perform high-resolution annotation of peripheral blood mononuclear cells (PBMCs) from healthy controls and from KD patients both before and after IVIG treatment. T-cell developmental trajectories were reconstructed using Monocle3-based pseudotime analysis. Furthermore, the functional significance of the identified pathway was validated in a CAWS-induced KD murine model. Results: A high-resolution single-cell landscape identified 13 distinct T-cell subtypes. Pseudotime analysis revealed significant lineage commitment of CD4+ T cells toward a Th17 phenotype during the acute phase of KD, synchronized with transcriptional upregulation of the STAT3/JAK signaling axis. Animal experiments further demonstrated that pharmacological inhibition of this pathway substantially attenuated inflammatory infiltration in the cardiac vasculature of KD mice. Conclusion: This study identifies the STAT3/JAK-mediated Th17 differentiation bias as a potential regulatory program associated with acute inflammation in Kawasaki disease, thereby highlighting the STAT3/JAK axis as a potential therapeutic target.
Soler-Saez, I.; Galiana-Rosello, C.; Grillo-Risco, R.; Falony, G.; Tepavčević, V.; Vieira Silva, S.; Garcia-Garcia, F.
Biological sex is a key determinant in the onset and progression of multiple diseases. In multiple sclerosis (MS), females exhibit higher disease prevalence, earlier onset, and more pronounced inflammatory activity. Males, by contrast, tend to experience a more severe neurodegenerative course, characterized by accelerated central nervous system damage and increased brain atrophy. The gut microbiome has emerged as a critical factor in MS, as its composition can either ameliorate or exacerbate disease progression. In this study, we aimed to identify reproducible sex-associated differences in gut microbial composition across independent cohorts of MS patients. Through a systematic search, we identified six independent studies based on 16S rRNA gene sequencing, comprising a total of 337 samples. Despite substantial inter-study variability, sex-associated differences were more pronounced in MS patients than in healthy controls. We identified 11 microbial taxa showing significant sex-associated differences in MS: nine enriched in females and two in males. Notably, the female-enriched taxa Eggerthella and Eisenbergiella were associated with specific MS subtypes and higher disability. To facilitate use of our findings by the scientific community, we developed a freely accessible web-based tool that provides full access to our results. Thus, we identified consistent and reproducible sex differences in the gut microbiota of MS patients. These findings highlight the importance of incorporating sex as a critical variable in microbiome research and have potential implications for understanding disease heterogeneity in MS. IMPORTANCE: Multiple sclerosis (MS) affects females and males differently, but the biological reasons behind these differences are not fully understood. One potential factor is the gut microbiome (i.e., the community of microorganisms living in our intestines), which can influence immune function and disease progression.
In this study, we analyzed data from multiple independent cohorts and found consistent differences in gut microbial composition between female and male MS patients. Notably, certain bacteria were more abundant in females and were linked to more severe disease features. We also developed a freely accessible web tool where researchers can explore the complete findings in detail. Our results highlight the importance of considering sex as a key factor in microbiome research and may help guide more personalized approaches to understanding and treating MS.
Spencer, D.; Liu, X.; Mosema-Be-Amoti, K.; Kandosi, G.; Bramble, M. S.; Munajjed, F. A.; Likuba, E.; Okitundu-Luwa E-Andjafono, D.; Tshibambe, L.; Colwell, B.; Howell, K.; O'Brien, N.; Moxon, C.; Anwar, S. M.; Porras, A. R.; Ngoyi, D. M.; Vilain, E.; Tshala-Katumbay, D.; Linguraru, M. G.
Background: Sickle cell disease (SCD) is a common inherited genetic disorder and a contributor to global childhood mortality and morbidity. In the Democratic Republic of the Congo, nearly 40,000 newborns, approximately 2% of all newborns, are estimated to be affected each year. Despite progress in the treatment and care of the disorder, its detection and management in lower-resource settings remain challenging. Methods: We collected 308 front-facing photographs of patients and their age- and sex-matched controls, aged 5 months to 19 years, in the Democratic Republic of the Congo. Facial features were extracted and categorized into geometric and texture-based descriptors. A support vector machine ranked features according to their relevance for distinguishing SCD patients from controls. Results: The facial analysis algorithm identified eight geometric and six texture-based discriminative features that were significantly different between the cohorts. An explainable machine learning model identified sickle cell disease with 79.5% accuracy using a combination of six geometric features:

- distance between medial and lateral canthi;
- angle at nasal ala;
- distance from nasion to philtrum;
- distance from medial canthi to the columella;
- distance from columella to the lower lip;
- distance between nasal alae.

SCD-related features became increasingly discriminative with age. Conclusion: These findings demonstrate the potential of machine-learning-based methodologies to inform point-of-care tools for the screening and management of sickle cell disease. The discriminative facial features identified here may provide further opportunities for artificial-intelligence-based diagnostics and personalized care strategies in sickle cell disease.
Queme, B.; Muruganujan, A.; Ebert, D.; Mushayahama, T.; Gauderman, W. J.; Mi, H.
Background: Accurate single-nucleotide polymorphism (SNP) annotation is central to genomic research, yet widely used tools and gene models often yield divergent results. Prior studies have shown such discrepancies in small datasets, but the extent of genome-wide variation and its impact on downstream pathway analysis remain unclear. Results: We conducted a comprehensive comparison of three commonly used SNP annotation tools (ANNOVAR, SnpEff, and VEP), using both Ensembl and RefSeq gene models to evaluate more than 40 million SNPs from the Haplotype Reference Consortium. At the protein level, annotation output differed significantly across tools and gene models (p-adj < 0.001), with discrepancies present in both genic and intergenic regions. RefSeq produced broader annotation coverage, particularly for intergenic SNPs, while Ensembl showed greater internal consistency. SnpEff provided the most complete coverage overall, but no single tool or model configuration fully recovered the union reference. Integration across tools and models maximized coverage and reduced annotation loss. In a case study of 204 colorectal cancer-associated SNPs from the FIGI GWAS, pathway enrichment results varied depending on the annotation strategy. The fully integrated approach identified all four significant pathways, whereas several single-tool or single-model strategies missed one or more. Conclusion: SNP annotation outcomes are influenced by both the tool and the gene model used, and relying on a single approach may result in incomplete coverage. A multi-tool, multi-model strategy provides the most comprehensive annotation and preserves enriched pathways, supporting more robust and reproducible genomic interpretation.
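The comparison treats the union of all tool/gene-model outputs as the reference and asks how much of that union each single configuration recovers. The sketch below shows that bookkeeping. It assumes each configuration's output has been reduced to a mapping from SNP ID to a set of annotated genes, which is a simplification of real ANNOVAR, SnpEff or VEP output. The configuration names and function names are hypothetical.

```python
def covered_snps(per_snp):
    """SNP IDs that received at least one gene annotation in one configuration."""
    return {snp for snp, genes in per_snp.items() if genes}


def coverage_vs_union(annotations):
    """Coverage of each configuration relative to the union of all configurations.

    annotations: dict mapping a configuration name (e.g. "VEP/Ensembl") to a dict of
    SNP ID -> set of annotated genes.
    """
    union = set()
    for per_snp in annotations.values():
        union |= covered_snps(per_snp)
    if not union:
        return {name: 0.0 for name in annotations}
    return {name: len(covered_snps(per_snp)) / len(union) for name, per_snp in annotations.items()}


def integrated_annotation(annotations):
    """Merge gene calls for every SNP across all configurations (the union strategy)."""
    merged = {}
    for per_snp in annotations.values():
        for snp, genes in per_snp.items():
            merged.setdefault(snp, set()).update(genes)
    return merged
```

Feeding the gene sets from `integrated_annotation` into a pathway enrichment step, instead of the output of any single configuration, corresponds to the "fully integrated" strategy described above.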
Manousopoulou, A.; White, C. H.; Hamal, S.; Nihalani, R.; Budoff, M. J.; Garbis, S. D.
BACKGROUND: As a GLP-1R agonist, semaglutide is known to exhibit pleiotropic health effects across the cardiometabolic spectrum in patients with type 2 diabetes mellitus (T2D). However, in-depth and unbiased protein- and phosphoprotein-level evidence reflecting such effects of semaglutide in plasma remains elusive. OBJECTIVES: This pilot study applied an innovative plasma proteomics and phosphoproteomics technology to a subset of patients with T2D who participated in the Semaglutide Treatment effect on coronary atherosclerosis Progression (STOP) randomized trial. The aim of this study was to identify the systemic effects of semaglutide treatment on pathways that underpin its pleiotropic cardiometabolic health benefits. METHODS: The study applied a proprietary liquid biopsy discovery proteomics platform and its derivative cardiometabolic spectrum database (International patent PCT/US2021/063407) to 16 patients from the STOP randomized trial. Plasma samples from 8 patients in the active group and 8 patients in the placebo group, collected at baseline and 52 weeks post treatment, were analyzed. The methodology proceeded in the following steps:

- a unique liquid fixative chemistry instantly solubilized and stabilized plasma proteins and phosphoproteins at room temperature;
- direct microflow monolithic partition chromatography;
- dialysis purification;
- solution-phase proteolysis;
- multiplex isobaric stable isotope labeling of proteotypic peptides;
- lab-on-chip TiO2/ZrO2 phosphopeptide enrichment;
- nanotechnology-enhanced ultra-high-resolution LC-MS analysis.

To identify differentially abundant proteins (DAPs) and phosphoproteins (DApPs) in patients treated with semaglutide vs. placebo, the abundance ratio of each was considered. Ratios were log2-transformed to normalize their distribution.
A paired one-sample t-test was performed, and the p-values were corrected for multiple hypothesis testing (FDR) using the two-stage step-up method of Benjamini, Krieger and Yekutieli. The threshold of significance was set at q ≤ 0.05. DAPs/DApPs were corrected for placebo. The expressed proteome and phosphoproteome were further interpreted with a multifactorial computational biology pipeline. This pipeline deconvoluted their underlying protein-level molecular pathways and networks, along with the transcription factors and kinases that regulate them. RESULTS: This study achieved extremely high depth and breadth of quantitative proteome and phosphoproteome coverage from only 20 µL of whole plasma equivalent per patient. In total, 13,173 proteins and 25,578 phosphopeptides were fully profiled (q ≤ 0.05). Of these, 1,040 were differentially abundant proteins (DAPs) and 1,064 were differentially abundant phosphoproteins (DApPs) in the semaglutide-treated group after correcting for placebo, at an absolute log2 fold change ≥ 0.5, CV ≤ 15%, and q ≤ 0.05. Of interest, this study profiled over 85% of all proteins/phosphoproteins (6,700) reported to date in the well-curated and up-to-date PaxDB database. Over 70% of these known proteins were of exosomal origin. Importantly, an additional ~9,000 plasma proteins and phosphoproteins in this study constituted entirely novel observations. Contextualization of the DAPs, DApPs, kinases and transcription factors for all significantly enriched canonical pathways (FDR-corrected q < 0.001) revealed a wide array of pathophysiological processes attributed to semaglutide treatment. Furthermore, these pathways provided molecular context for the imaging biomarkers reported for the same STOP trial patients.
CONCLUSIONS: This feasibility study demonstrated how an effective plasma proteomics and phosphoproteomics platform can generate a treatment-adaptive companion diagnostic molecular signature that holistically captures the multiple cardiometabolic spectrum health effects of semaglutide in patients with atherosclerosis and T2D.
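The Methods name the two-stage step-up FDR procedure of Benjamini, Krieger and Yekutieli and then filter on fold change, CV and q. A pure-Python sketch of that procedure and of the reported filter thresholds follows. It assumes the paired t-test p-values have already been computed and returns reject/accept decisions rather than adjusted q-values. The function names are hypothetical. In practice, statsmodels' `multipletests` offers the same procedure via `method="fdr_tsbky"`.

```python
def bh_reject(pvalues, alpha):
    """Benjamini-Hochberg step-up at level `alpha`: list of booleans (True = rejected)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0  # largest rank whose p-value is at or below its BH threshold
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / m:
            k = rank
    rejected = [False] * m
    for i in order[:k]:
        rejected[i] = True
    return rejected


def bky_two_stage(pvalues, q=0.05):
    """Two-stage step-up procedure of Benjamini, Krieger and Yekutieli (2006)."""
    m = len(pvalues)
    q1 = q / (1 + q)
    stage1 = bh_reject(pvalues, q1)          # stage 1: BH at the reduced level q / (1 + q)
    r1 = sum(stage1)
    if r1 == 0 or r1 == m:
        return stage1
    m0_hat = m - r1                          # estimated number of true null hypotheses
    return bh_reject(pvalues, q1 * m / m0_hat)  # stage 2: BH at the adapted level


def passes_filters(log2_ratio, cv_percent, significant):
    """Reported DAP/DApP filter: FDR-significant, |log2 fold change| >= 0.5 and CV <= 15 %."""
    return significant and abs(log2_ratio) >= 0.5 and cv_percent <= 15.0
```

Consider eight p-values of 0.001, one of 0.06 and one of 0.9 at q = 0.05. Plain Benjamini-Hochberg rejects 8 hypotheses. The two-stage procedure estimates only two true nulls, relaxes its second-stage level accordingly, and rejects 9.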
Xue, X.; Lin, Y.-P.; Feng, Y.; So, H.-C.
Background: A bidirectional relationship has been observed between COVID-19 and respiratory disorders, where respiratory comorbidities increase severity and COVID-19 induces respiratory sequelae. The underlying biological and genetic mechanisms remain unclear. While previous studies have identified overlapping genetic loci, few have systematically disentangled the genetic factors shared between these conditions from those specific to COVID-19, particularly at a multi-omics level. Methods: We developed and applied a unified analytical framework to compare three COVID-19 phenotypes with eight respiratory disorders (including asthma, COPD, IPF, and pneumonia). Using the cofdr method for shared genetic signal analysis and DDx/mtCOJO for differentiation, we integrated genome-wide association statistics with multi-omics data (transcriptome, splicing, and proteome). This approach allowed simultaneous identification of shared genetic signals (concordant or discordant) and disease-specific variants across expression (TWAS), alternative splicing (spTWAS), and protein abundance (PWAS). Results: We delineated a comprehensive atlas of 214 differential loci and numerous shared loci across 24 pairwise comparisons. The shared genetic architecture was characterized by pleiotropic effects in genes such as ATP11A (exhibiting opposing effects in COVID-19 vs. IPF) and GSDMB (shared with COPD). Crucially, differentiation analysis revealed that severe COVID-19 is genetically distinct from other respiratory infections (e.g., pneumonia and influenza). The distinguishing features were dysregulated Type I/III interferon signaling, specific defects in alveolar epithelial and macrophage function, and GM-CSF/surfactant metabolism pathways. These findings provide direct genetic evidence supporting the use of GM-CSF modulators and interferon-lambda for COVID-19 treatment, therapies that have already entered clinical trials.
Furthermore, multi-trait conditional analysis prioritized FYCO1 and HCN3 as potential COVID-19-specific risk genes. Splicing analysis underscored the critical role of alternative splicing in both shared and differential architectures, highlighting IFNAR2 isoform regulation as a key discriminator between COVID-19 and other respiratory traits. Conclusion: This study provides the first genome-wide, multi-omics map revealing the shared and differential genetic landscapes of COVID-19 and other respiratory phenotypes. By uncovering specific molecular mechanisms that distinguish COVID-19 pathology, particularly those involving surfactant homeostasis and interferon pathways, our findings offer novel insights for targeted drug repurposing and precision risk stratification.
Neurgaonkar, P.; Dierolf, M.; O'Gorman, L.; Remmele, C.; Schaeffer, J.; Popp, I.; Borst, A.; Rost, S.; Ankenbrand, M.; Kratz, C.; Bergmann, A.; Kalb, R.; Yu, J.
Motivation: Fanconi anemia (FA) is a rare disease mainly caused by biallelic pathogenic variants, including structural variants such as large deletions and insertions in FA genes. Currently, variant detection is based on short-read sequencing and probe-based approaches. However, determining the exact genomic breakpoint or achieving allelic discrimination remains challenging. Nanopore-based long-read sequencing enables comprehensive detection of FA variants, but a unified bioinformatic analysis platform for these data is missing. Results: We present FA-NIVA (Fanconi anemia - Nanopore Indel and Variant Analysis), an automated and adaptable analysis workflow tailored to Nanopore-based long-read sequencing data in FA genetic analysis. FA-NIVA integrates state-of-the-art tools to comprehensively detect both single nucleotide variants (SNVs) and structural variants (SVs). Our analysis platform enhances genotyping accuracy for biallelic variants through joint SNV-SV-based phasing in FA-associated genes. FA-NIVA is built within the Nextflow ecosystem and powered by containerized Docker images, which ensures reproducibility, flexibility, scalability and transparency across different computing environments. Together, these features make FA-NIVA a robust end-to-end solution for automated SV and SNV analysis and high-resolution phasing in FA genes, enabling accurate and efficient genetic analysis. Availability: FA-NIVA is available on GitHub at: https://github.com/UKWgenommedizin/FA-NIVA.
Solomon, D. H.; Santacroce, L.; Giles, J.; Rist, P. M.; Everett, B. M.; Liao, K. P.; Paudel, M.; Shadick, N. A.; Weinblatt, M. E.; Bathon, J. M.; Demler, O. V.
Background: Cardiovascular (CV) disease risk is increased in rheumatoid arthritis (RA), and CV disease is the leading cause of mortality in this population. Improved CV risk stratification tools in RA could enhance the use of preventative care and improve outcomes. Methods: We previously studied biomarkers of CV disease that were associated with CV risk: adiponectin, hsCRP, Lp(a), osteoprotegerin (OPG), high-sensitivity cardiac troponin T (hsTnT), serum amyloid A (SAA), YKL-40, and soluble TNF receptor 1 (sTNFR1). In the current study, these biomarkers were tested in an unrelated external cohort of RA patients without a history of CV events, followed at a single academic medical center. CV events were identified through Medicare and Medicaid administrative data or through medical record review of self-reported events. Biomarkers were assessed at cohort entry in a nested sample of cases and controls, matched 1:1 on sex and age. Analyses were conducted using conditional logistic regression. We examined whether adding the candidate biomarkers to clinical CV risk factors improved model prediction, using the area under the curve (AUC) and the net reclassification index (NRI). Results: From a cohort of 1,345 eligible patients with RA, we identified 123 patients with confirmed CV events. Cases and matched controls were typical of RA:

- median age 63 years;
- 77% women;
- RA disease duration 11 years;
- 72% seropositive;
- 85% using a biologic or conventional disease-modifying anti-rheumatic drug;
- 58% using non-steroidal anti-inflammatory drugs;
- 30% using oral glucocorticoids.

From the candidate biomarkers, LASSO regression selected hsTnT and sTNFR1 as associated with CV events. The AUC for the model including only clinical risk factors was 0.758 (95% CI 0.689-0.829); after adding hsTnT and sTNFR1, the AUC increased to 0.802 (95% CI 0.718-0.998). The NRI of the model with biomarkers was 16.3%, with improvement observed only in patients who did not have CV events during follow-up.
Conclusions: Adding selected biomarkers to clinical risk factors enhances the discrimination of models predicting CV events among patients with RA. These risk models require prospective testing to determine whether they add value to clinical decision-making regarding preventative care.
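The abstract does not say whether a categorical or a category-free NRI was used. The sketch below assumes the category-free (continuous) version and returns its event and non-event components separately. This makes a finding such as "improvement only in non-events" directly visible: a positive non-event component alongside a non-positive event component. The function name is hypothetical.

```python
def continuous_nri(risk_old, risk_new, events):
    """Category-free net reclassification index.

    NRI = [P(up | event) - P(down | event)] + [P(down | non-event) - P(up | non-event)],
    where "up"/"down" means the new model assigns a higher/lower risk than the old one.
    Returns (total NRI, event component, non-event component).
    """
    up_e = down_e = n_e = 0
    up_ne = down_ne = n_ne = 0
    for old, new, event in zip(risk_old, risk_new, events):
        if event:
            n_e += 1
            up_e += new > old
            down_e += new < old
        else:
            n_ne += 1
            up_ne += new > old
            down_ne += new < old
    if n_e == 0 or n_ne == 0:
        raise ValueError("need at least one event and one non-event")
    nri_events = (up_e - down_e) / n_e
    nri_nonevents = (down_ne - up_ne) / n_ne
    return nri_events + nri_nonevents, nri_events, nri_nonevents
```

The inputs would be the predicted risks from the clinical-only model and from the model with hsTnT and sTNFR1 added, evaluated on the same matched cases and controls.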
Enwere, M.; Turiello, R.; Foo, J.; Nouwairi, R.; McElroy, J. H.; Medearis, E.; Smith, D.; Laurell, N.; Clayton, A.; Yarlagadda, A.; Aitchison, K.; Venton, B. J.; Landers, J. P.
Specific drug metabolism rates are determined by an individual's cytochrome P450 (CYP) genotype, including polymorphic changes in any of 200+ CYP genes. One example is CYP2C19, for which associations between gene polymorphisms and variability in certain drug metabolism rates have been linked to inter-individual and inter-ethnic differences in therapeutic outcomes. Pharmacogenomic screening for these variants before drug and dosage prescription has well-defined links to better treatment outcomes. However, current implementation is limited to complex and costly variant-probing and DNA sequencing protocols. These protocols have limited availability in clinical laboratories, leading to slow turnaround times that impede effective clinical intervention. Here we describe a novel, cost-effective, multiplex genotyping approach to screening CYP2C19 variants. Fluorescence nested allele-specific (FAS) PCR was used with primers to detect CYP2C19 variants of interest in specific hot spots. These included the Tier 1 haplotypes identified by the Association for Molecular Pathology (AMP): CYP2C19*2, *3, and *17. The presence or absence of wild-type and mutant alleles was determined independently as haplotypes, and in a multiplex reaction as diplotypes representing the 10 possible genotype combinations/profiles. FAS-PCR achieved the same genotype calls as a pyrosequencing protocol optimized for validating genotypes, but with a simpler and more sensitive interface. The FAS-PCR method correctly identified the genotypes of both synthesized DNA and a human genomic DNA standard. Uniquely, the FAS-PCR protocol generates patterns using one fluorescently labeled primer irrespective of the number of variants targeted. This makes it considerably more cost-effective than other allele-specific PCR-based techniques, which involve labeling both the forward and reverse primers.
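The figure of 10 genotype combinations follows from counting unordered allele pairs: with n distinguishable alleles there are n(n+1)/2 diplotypes. The short sketch below enumerates them. It assumes the four alleles are the wild-type *1 plus the three AMP Tier 1 alleles (*2, *3, *17), so n = 4 gives 10. The constant and function names are hypothetical.

```python
from itertools import combinations_with_replacement

# Assumption: the wild-type allele is *1 and the variant alleles are the AMP Tier 1 set.
ALLELES = ["*1", "*2", "*3", "*17"]


def possible_diplotypes(alleles=ALLELES):
    """All unordered allele pairs (diplotypes); n alleles give n(n+1)/2 combinations."""
    return [f"{a}/{b}" for a, b in combinations_with_replacement(alleles, 2)]
```

`combinations_with_replacement` treats *1/*17 and *17/*1 as the same diplotype, which matches how an unphased genotyping assay reports results.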
Orkild, M. R.; Dybdahl, K. L.; Duun Rohde, P. D.
Inflammatory bowel disease (IBD) frequently co-occurs with immune-mediated and metabolic disorders, but whether these associations reflect shared genetics or causal effects remains unclear. We performed two-sample Mendelian randomization (MR) using large-scale genome-wide association study (GWAS) summary statistics to investigate potential causal effects of immune-mediated diseases and lifestyle traits on IBD, Crohn's disease (CD), and ulcerative colitis (UC). SNP-based heritability and genetic correlations were estimated to contextualize findings. Following false discovery rate correction, genetically predicted psoriasis was positively associated with IBD (OR 1.15), CD (OR 1.23), and UC (OR 1.10), with the strongest effect observed for CD. Genetically predicted type 2 diabetes mellitus (T2DM) showed a modest inverse association with UC (OR 0.88). No lifestyle-related traits remained significant after correction. Sensitivity analyses indicated heterogeneity across instruments and evidence of directional pleiotropy in selected models, whereas no pleiotropy was detected for the T2DM-UC association. These findings support a role for psoriasis-related immune pathways in IBD susceptibility and suggest a potential inverse association between genetic liability to T2DM and UC.
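The abstract does not name the MR estimator. As an assumption, the sketch below uses the fixed-effect inverse-variance-weighted (IVW) estimator, a common primary method in two-sample MR. It shows how per-SNP summary statistics combine into a single log-odds estimate, which is then exponentiated to an odds ratio such as the ORs reported above. All SNP values are invented for illustration.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate from per-SNP summary statistics.

    Equivalent to a weighted mean of the per-SNP Wald ratios
    beta_outcome / beta_exposure, with first-order weights
    beta_exposure**2 / se_outcome**2.
    Returns (causal estimate on the log-odds scale, standard error).
    """
    num = den = 0.0
    for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome):
        num += bx * by / se ** 2
        den += bx ** 2 / se ** 2
    return num / den, math.sqrt(1.0 / den)


# Hypothetical instruments (invented values): SNP effects on the exposure,
# SNP effects on the outcome (log-odds), and outcome standard errors.
bx = [0.12, 0.08, 0.05, 0.20]
by = [0.030, 0.015, 0.010, 0.045]
se = [0.010, 0.012, 0.009, 0.015]

beta, se_beta = ivw_estimate(bx, by, se)
odds_ratio = math.exp(beta)
ci = (math.exp(beta - 1.96 * se_beta), math.exp(beta + 1.96 * se_beta))
print(f"OR = {odds_ratio:.2f}, 95% CI {ci[0]:.2f}-{ci[1]:.2f}")
```

In practice, the heterogeneity and directional-pleiotropy checks mentioned in the abstract would be run alongside this estimate, typically with Cochran's Q and MR-Egger. Those methods are not shown here.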
Shouma, A.; Giannoudi, M.; Conning-Rowland, M.; Drozd, M.; Brown, O. I.; Cheng, C. W.; Sukumar, P.; Bridge, K. I.; Levelt, E.; Bailey, M. A.; Griffin, K. J.; Kearney, M. T.; Cubbon, R. M.
Objective: Diabetes mellitus (DM) approximately doubles the risk of atherosclerotic cardiovascular disease (ASCVD) events, but the molecular basis is poorly understood. We aimed to define arterial differentially expressed genes (DEGs) associated with DM, validate hits as plasma proteins, and ascertain whether these complement ASCVD risk prediction tools. Research design and methods: RNA-sequencing data from the Genotype-Tissue Expression (GTEx) cohort were used to define DEGs associated with DM at two arterial sites in >90 people with DM and >330 controls. UK Biobank (UKB) was used to corroborate that DEGs, in their plasma protein form, were differentially abundant in people with DM and associated with ASCVD events. Finally, we assessed whether including these plasma proteins improved the performance of the SCORE2 and SCORE2-Diabetes ASCVD risk models. Results: 619 and 356 DEGs were associated with DM in the thoracic aorta and tibial artery, respectively. Of these, 22 were common to both arteries, all of which were directionally concordant. Five of these 22 were included in the UKB plasma proteomics dataset, and we corroborated 4 (ACP5, LEFTY2, LILRA5 and PSME2) as showing concordant differential abundance in people with DM; all 4 demonstrated associations with a range of incident ASCVD events. Adding the 4 proteins to SCORE2 and SCORE2-Diabetes (for people without and with DM, respectively) improved the population-level discrimination, classification and calibration of these models. Conclusions: DM is associated with a distinct arterial gene expression profile, hits from which are associated with ASCVD events and add value to risk prediction. [Visual abstract: Figure 1]
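The step from 619 and 356 site-specific DEGs to 22 shared, directionally concordant genes amounts to a set intersection followed by a sign check on the fold changes. The minimal sketch below assumes each DEG list is stored as a mapping from gene to log2 fold change (DM vs control). The gene names and values are placeholders, not data from the study.

```python
def shared_concordant(degs_a, degs_b):
    """Intersect two DEG sets and keep genes changed in the same direction.

    degs_a, degs_b: dict mapping gene -> log2 fold change (DM vs control)
    for one arterial site each. DEGs are assumed to have non-zero fold changes.
    Returns (shared genes, shared genes with matching fold-change signs).
    """
    shared = sorted(set(degs_a) & set(degs_b))
    concordant = [g for g in shared if (degs_a[g] > 0) == (degs_b[g] > 0)]
    return shared, concordant


# Toy inputs: placeholder gene names and invented fold changes.
aorta = {"GENE_A": 0.8, "GENE_B": -0.5, "GENE_C": 1.2}
tibial = {"GENE_B": -0.3, "GENE_C": -0.4, "GENE_D": 0.9}

shared, concordant = shared_concordant(aorta, tibial)
print(shared)      # ['GENE_B', 'GENE_C']
print(concordant)  # ['GENE_B']
```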
Soplenkova, A.; Maslov, D.; Timoshchuk, A.; Kifer, D.; Cvetko, A.; Georges, M.; Steves, C. J.; Menni, C.; Sharapov, S.; Lauc, G.; Aulchenko, Y. S.
The genetic regulation of plasma N-glycome variation in human populations is not fully characterized, partly because of the limited sample sizes of glyco-genetics studies. Here, we aimed to demonstrate that protein-specific N-glycan profiles, such as those of immunoglobulin G (IgG), can be accurately reconstructed from the total plasma N-glycome (TPNG). This would allow us to find new regulators of this complex process by re-analysing existing datasets. We tested multiple linear and non-linear machine learning approaches and built a model to reconstruct IgG N-glycans from TPNG data, training on the TwinsUK cohort and validating on CEDAR. We reconstructed GWAS summary statistics for IgG N-glycans by applying the trained linear model to plasma glycan GWAS summary statistics, i.e., as GWAS of linear combinations of plasma glycan traits. Most of the identified loci had previously been implicated in IgG N-glycosylation GWAS. Additionally, we found four new loci and suggest roles for FCRLA, KDELR2, HHEX, and TCF3 in the regulation of IgG N-glycosylation. In conclusion, we showed that our method enables the creation of protein-specific N-glycome datasets, allowing for powerful meta-analyses without the need to profile new samples.
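A GWAS of a linear combination of traits can be approximated from per-trait summary statistics. The effect is the weighted sum of per-trait effects, and its variance needs the covariance between those effect estimates. The minimal sketch below makes two assumptions: the weights come from a trained linear model, and the correlation between per-trait estimates is supplied by the user (often approximated by the phenotypic correlation when all traits were measured in the same samples). The abstract does not specify how the authors handled this covariance. The function name `combine_gwas` and all numbers are hypothetical.

```python
import math


def combine_gwas(weights, betas, ses, corr):
    """Approximate single-SNP GWAS statistics for sum_i w_i * trait_i.

    weights: linear-model coefficients for each plasma glycan trait
    betas, ses: that SNP's per-trait effect estimates and standard errors
    corr: assumed correlation matrix between the per-trait estimates
    Returns (beta, se, z) for the combined trait.
    """
    k = len(weights)
    beta = sum(w * b for w, b in zip(weights, betas))
    # var = w^T S R S w, where S = diag(ses) and R = corr.
    var = sum(weights[i] * weights[j] * ses[i] * ses[j] * corr[i][j]
              for i in range(k) for j in range(k))
    se = math.sqrt(var)
    return beta, se, beta / se


# Hypothetical example: three plasma glycan traits, one SNP.
w = [0.6, -0.3, 0.2]
b = [0.05, -0.02, 0.01]
s = [0.010, 0.012, 0.011]
r = [[1.0, 0.4, 0.1],
     [0.4, 1.0, 0.2],
     [0.1, 0.2, 1.0]]
print(combine_gwas(w, b, s, r))
```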
Uppaluri, K. R.; Challa, H. J.; Vempati, K. K.; Kadali, L. N.; Palasamudram, K.; Rayala, M.
Coronary artery disease (CAD) is a multifactorial condition influenced by genetic, phenotypic, and environmental factors. Traditional risk prediction models fall short in capturing the polygenic complexity of CAD, particularly in underrepresented populations. This study presents SIGMA (Scoring Importance of Genes specific to disease using Machine learning Algorithms), a novel AI-powered framework that enhances CAD risk prediction by integrating genomic and phenotypic data. Our approach leverages GEMS (GeneConnectRx Evidence Metrics), an LLM-driven system to score 1772 CAD-associated genes, and CASCADE (Comprehensive Assessment of Sequence and Clinical Annotation Data Evaluation), a tiered variant scoring pipeline. Using whole exome sequencing (WES) data from 1,243 individuals (628 controls, 615 CAD cases), the model integrates age and gender as key non-modifiable phenotypes. Results show significant improvements in sensitivity (from 0.41 to 0.79), specificity (0.70 to 0.72), and AUC (0.59 to 0.81) when phenotype data are incorporated. Our findings highlight the potential of AI-integrated genomics for population-specific CAD risk stratification.
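Sensitivity, specificity and AUC summarize different aspects of a case/control classifier. The sketch below uses the standard definitions: sensitivity = TP/(TP+FN), specificity = TN/(TN+FP), and AUC as the probability that a random case scores higher than a random control, with ties counting one half. It is a generic illustration, not the SIGMA implementation, and the toy labels and scores are invented.

```python
def sens_spec(y_true, y_pred):
    """Sensitivity and specificity from binary labels (1 = case, 0 = control)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """Rank-based (Mann-Whitney) AUC: P(case score > control score)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy example: two cases, two controls, risk scores thresholded at 0.5.
y = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
preds = [int(s >= 0.5) for s in scores]
print(sens_spec(y, preds))  # (0.5, 0.5)
print(auc(y, scores))       # 0.75
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarizes ranking across all thresholds. This difference is why the three metrics can improve by very different amounts.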